Abstract:
In the research literature on survey methodology, there is considerable discussion of interviewer effects and how to prevent data fabrication; however, there is little discussion on the detection of data fabrication by interviewers in published data, and there are even fewer papers examining the phenomenon of employees of survey research organizations fabricating data. Among them, Blasius and Thiessen (2015) show for the PISA 2009 principal data that employees of survey research organizations in some countries duplicate cases to generate data. While the authors focus on exact copies, more sophisticated data fabrication techniques might include duplicating whole cases and changing a few entries afterwards. By calculating Hamming distances and applying them to the same data, we show that – in some countries in particular – large parts of the data have been duplicated, and most of them have been retrospectively modified to a small degree.
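A minimal sketch of this kind of Hamming-distance screen, with made-up records; the threshold and data layout are illustrative, not those of the PISA 2009 analysis:

```python
# Flag near-duplicate survey cases via Hamming distance.
# Records and the max_diff threshold are illustrative only.
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))

def near_duplicates(records, max_diff=2):
    """Return pairs of record indices differing in at most max_diff entries."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if hamming(a, b) <= max_diff:
            pairs.append((i, j))
    return pairs

# Toy data: case 2 is a copy of case 0 with one entry changed afterwards.
cases = [
    (1, 3, 2, 4, 1, 2),
    (4, 1, 1, 2, 3, 4),
    (1, 3, 2, 4, 2, 2),
]
print(near_duplicates(cases))  # [(0, 2)]
```

An exact copy would have distance 0; retrospectively modified duplicates of the kind described above show small but nonzero distances.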
Abstract:
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are required for configuring and running distributed computing pipelines—what we call flows—that link instruments, computers (e.g., for analysis, simulation, artificial intelligence [AI] model training), edge computing (e.g., for analysis), data stores, metadata catalogs, and high-speed networks. We review common patterns associated with such flows and describe methods for instantiating these patterns. We present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages powerful computers for data inversion, model training, or other purposes. We also discuss implications of such methods for operators and users of scientific facilities.
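As a rough illustration of the flow idea, a stream can be filtered online so that only interesting elements reach the data store. All names here are hypothetical and do not reflect the API of the system described in the abstract:

```python
# Hypothetical sketch of a "flow": chained stages that pass only
# interesting subsets of a data stream downstream.
def detector(frames):
    """Source stage: yields raw frames from an instrument."""
    yield from frames

def threshold_filter(stream, cutoff):
    """Analysis stage: discard frames whose peak signal is below cutoff."""
    for frame in stream:
        if max(frame) >= cutoff:
            yield frame

def archive(stream):
    """Sink stage: collect surviving frames (stand-in for a data store)."""
    return list(stream)

raw = [[0, 1, 2], [5, 9, 1], [0, 0, 1], [7, 3, 2]]
kept = archive(threshold_filter(detector(raw), cutoff=5))
print(kept)  # [[5, 9, 1], [7, 3, 2]]
```

Because the stages are generators, frames are processed as they arrive rather than buffered, which is the property that matters at multiple GB/s.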
Abstract:
Research data are fragile and subject to classical measurement error as well as to the risk of manipulation. This also applies to survey data, which might be affected by deviant behavior at different stages of the data collection process. Assuring data quality requires focusing on the incentives to which all actors in the process are exposed. Relevant actors and some of their specific incentives are presented. The role of data-based methods for detecting deviant behavior is highlighted, as are their limitations once actors are aware of them. Conclusions are drawn on how settings can be improved to provide positive incentives. Furthermore, it is stressed that proper documentation of data quality issues in survey data is required, both to increase trust in the data eventually used for analysis and to provide input for the development of new methods for detecting deviant behavior.
Abstract:
Purpose - The purpose of this paper is to describe a unique approach to investigate the wrinkle force of textile structures in a cylindrical model.
Design/methodology/approach - In this research, an apparatus was designed and constructed to investigate the torsional and wrinkle behavior of textile structures in a cylindrical model under different rotational levels, using data acquisition and micro-controller systems.
Findings - In light of the research results, fiber and fabric type, the physical and mechanical properties of the fabric, and the imposed rotational level contributed significantly to the wrinkle characteristics of worsted fabrics. It was noticed that with increasing rotational level, the wrinkle force and energy increased along both the weft and warp directions. Wrinkle characteristics along the warp direction exhibited greater values than in the weft direction.
Originality/value - The study is aimed at determining wrinkle behavior of worsted fabrics under the combined influences of compression and torsional strains.
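As a hedged illustration only: one common way to turn a measured force-rotation curve into an "energy" value is numerical integration of force over the imposed rotation. The numbers below, and whether this matches the study's exact definition of wrinkle energy, are assumptions:

```python
# Illustrative only: a wrinkle "energy" computed as the area under a
# force-rotation curve by trapezoidal integration. Readings are invented.
def trapezoid(xs, ys):
    """Area under y(x) sampled at points (xs, ys)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
               for i in range(len(xs) - 1))

rotation_deg = [0, 90, 180, 270, 360]     # imposed rotational levels
force_n      = [0.0, 0.4, 1.1, 2.0, 3.2]  # hypothetical wrinkle force readings

energy = trapezoid(rotation_deg, force_n)
print(energy)  # 459.0
```

The monotone increase of force with rotational level in this toy curve mirrors the trend reported in the findings.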
Abstract:
Weaving is one of the most popular fabric manufacturing techniques. The weaving process consists of three major stages: warping, sizing, and weaving. A weaving factory therefore generates a lot of data, but unfortunately there has been no attempt to apply machine learning or data science to weaving production, although there is ample scope for statistical analysis, data science, and machine learning. The dataset was prepared from nine months of daily production reports. The final dataset contains 121,148 records with 18 parameters, whereas the raw data contain the same number of entries with 22 columns. The raw data required substantial work to combine the daily production reports, treat missing values, rename columns, and perform feature engineering to derive EPI, PPI, warp count, and weft count values, among others. The complete dataset is stored at https://data.mendeley.com/datasets/nxb4shgs9h/1. It is further processed to obtain the rejection dataset, which is stored at https://data.mendeley.com/datasets/6mwgj7tms3/2. Future uses of the dataset include predicting weaving waste, investigating statistical relations among the parameters, and production prediction.
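One plausible sketch of the feature-engineering step, assuming (purely for illustration) that EPI, PPI, and yarn counts are derived from a fabric construction string of the form "EPIxPPI/warpxweft"; the dataset's real schema may differ:

```python
# Hedged sketch: deriving EPI, PPI, and yarn counts from a textual
# fabric construction column. The "EPIxPPI/warpxweft" format is an
# assumption, not the published dataset's actual schema.
def parse_construction(text):
    """Split e.g. '110x80/40x40' into EPI, PPI, warp count, weft count."""
    density, counts = text.split("/")
    epi, ppi = (int(v) for v in density.split("x"))
    warp, weft = (int(v) for v in counts.split("x"))
    return {"EPI": epi, "PPI": ppi, "warp_count": warp, "weft_count": weft}

print(parse_construction("110x80/40x40"))
# {'EPI': 110, 'PPI': 80, 'warp_count': 40, 'weft_count': 40}
```

Applied column-wise over the combined daily reports, this turns one free-text field into four numeric features.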
Abstract:
To enable globally distributed computing for a large HEP experiment, a collection of computing and data storage facilities, together called the Grid fabric, must be linked together in a coherent way. The standard Grid software, including most notably the Globus Gatekeeper and Meta Directory Service, provides core tools to insert a site into the Grid and for its low-level monitoring. In practice, large experiments have data and job handling infrastructures that are not governed by the core tools. For example, local job submission is seldom done directly to the batch system, but rather, through an interface that allows for pre-submission steps (such as the decomposition of a job into smaller chunks) or is tightly integrated with a data handling system such as SAM. Likewise, monitoring is seldom done in terms of individual processors or individual jobs, but rather, via cluster-wide aggregated characteristics. In this paper, we present some of the work we have done to abstract the management of the fabric facilities of the FNAL Run II experiments, in order to enable globally distributed computing.
Abstract:
While updating a systematic review on the topic of ovulation induction, we observed unusual similarities in a number of randomised controlled trials (RCTs) published by two authors from the same institute, in the same disease spectrum, within a short period of time. We therefore undertook a focused analysis of the data integrity of all RCTs published by the two authors. We made pairwise comparisons to find identical or similar values in baseline characteristics and outcome tables between trials. We also assessed whether baseline characteristics were compatible with chance, using Monte Carlo simulations and the Kolmogorov-Smirnov test.
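The chance-compatibility checks can be sketched roughly as follows, with invented data; the paper's exact procedures may differ:

```python
# Sketch: a Monte Carlo permutation p-value for a baseline difference in
# means between two trial arms, plus a one-sample Kolmogorov-Smirnov
# statistic of a set of p-values against Uniform(0,1). Data are invented.
import random

def mc_pvalue(a, b, n_sim=10_000, seed=0):
    """P(|mean diff| >= observed) under random re-labelling of arms."""
    rng = random.Random(seed)
    pooled = a + b
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        a_sim, b_sim = pooled[:len(a)], pooled[len(a):]
        if abs(sum(a_sim) / len(a) - sum(b_sim) / len(b)) >= observed:
            hits += 1
    return hits / n_sim

def ks_uniform(pvalues):
    """KS distance between the empirical p-value distribution and Uniform(0,1)."""
    xs = sorted(pvalues)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

arm_a = [27.1, 26.4, 28.0, 25.9, 27.5]  # e.g. hypothetical baseline BMI values
arm_b = [26.8, 27.3, 26.1, 27.9, 26.5]
p = mc_pvalue(arm_a, arm_b)
print(p, ks_uniform([p, 0.4, 0.9, 0.2]))
```

Under genuine randomisation, baseline p-values collected across many trials should look roughly uniform; a large KS distance is the kind of incompatibility with chance such an analysis looks for.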
Abstract:
This paper considers the requirements for a scalable, easily manageable, fault-tolerant, and efficient data center network fabric. Trends in multi-core processors, end-host virtualization, and commodities of scale are pointing to future single-site data centers with millions of virtual end points. Existing layer 2 and layer 3 network protocols face some combination of limitations in such a setting: lack of scalability, difficult management, inflexible communication, or limited support for virtual machine migration. To some extent, these limitations may be inherent for Ethernet/IP style protocols when trying to support arbitrary topologies. We observe that data center networks are often managed as a single logical network fabric with a known baseline topology and growth model. We leverage this observation in the design and implementation of PortLand, a scalable, fault-tolerant layer 2 routing and forwarding protocol for data center environments. Through our implementation and evaluation, we show that PortLand holds promise for supporting a "plug-and-play" large-scale data center network.
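A sketch of the hierarchical pseudo MAC (PMAC) encoding that PortLand uses to make layer 2 forwarding scale with the known baseline topology; the pod:position:port:vmid field widths below follow the commonly cited 16/8/8/16-bit layout:

```python
# Sketch of a PortLand-style hierarchical pseudo MAC (PMAC): an end
# host's topological location is packed into a 48-bit address, so core
# switches can forward on a pod prefix instead of flat MAC tables.
def make_pmac(pod, position, port, vmid):
    """Pack pod(16) . position(8) . port(8) . vmid(16) into 48 bits."""
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(value >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))

def parse_pmac(pmac):
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int(pmac.replace(":", ""), 16)
    return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
            (value >> 16) & 0xFF, value & 0xFFFF)

addr = make_pmac(pod=2, position=1, port=0, vmid=1)
print(addr)              # 00:02:01:00:00:01
print(parse_pmac(addr))  # (2, 1, 0, 1)
```

Because the prefix structure mirrors the fat-tree topology, forwarding state grows with the number of pods rather than the number of (virtual) end points.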
Abstract:
Highlights
- A new tool to detect money laundering criminals is proposed.
- Benford’s Law and machine learning are combined to find patterns of money laundering.
- The tool is tested in the context of a real macro-case on money laundering.
- Additional suspicious companies are identified.
Objectives - This paper is based on the analysis of the database of operations from a macro-case on money laundering orchestrated between a core company and a group of its suppliers, 26 of which had already been identified by the police as fraudulent companies. In the face of a well-founded suspicion that more companies have perpetrated criminal acts, and in order to make better use of what are very limited police resources, we aim to construct a tool to detect money laundering criminals.
Methods - We combine Benford’s Law and machine learning algorithms (logistic regression, decision trees, neural networks, and random forests) to find patterns of money laundering criminals in the context of a real Spanish court case.
Results - After mapping each supplier’s set of accounting data into a 21-dimensional space using Benford’s Law and applying machine learning algorithms, additional companies that could merit further scrutiny are flagged up.
Conclusions - A new tool to detect money laundering criminals is proposed in this paper. The tool is tested in the context of a real case.
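The Benford's-Law feature step can be sketched as follows: map each supplier's amounts to first-digit frequencies and measure their deviation from the Benford distribution. The classifier stage is omitted and the data are invented:

```python
# Sketch of Benford's-Law features for accounting amounts: first-digit
# frequencies compared against the expected log10(1 + 1/d) distribution.
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """Leading significant digit of a nonzero amount."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_freqs(amounts):
    """Observed frequency of each leading digit 1..9."""
    counts = [0] * 9
    for x in amounts:
        counts[first_digit(x) - 1] += 1
    return [c / len(amounts) for c in counts]

def benford_deviation(amounts):
    """Sum of absolute deviations from Benford's expected frequencies."""
    return sum(abs(f - b) for f, b in zip(digit_freqs(amounts), BENFORD))

amounts = [120, 14, 1.9, 23, 35, 160, 47, 512, 18, 96]  # invented invoices
print(benford_deviation(amounts))
```

In the setting described above, several such Benford-based statistics per supplier make up the 21-dimensional feature space fed to the classifiers.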